Linux Networking

14 posts

When Packets Can't Wait: Comparing Protocols for Delay-Sensitive Data

In Diagnosing Video Stuttering Over TCP, we built a diagnostic framework—identifying zero-window events (receiver overwhelmed) versus retransmits (network problems). In The TCP Jitter Cliff, we discovered that throughput collapses unpredictably when jitter exceeds ~20% of RTT, and the chaos zone makes diagnosis treacherous.

The conclusion from both posts is clear: TCP is inappropriate for delay-sensitive streaming. Its guaranteed-delivery model creates unbounded latency during loss recovery. When a packet is lost, TCP holds back everything behind it until the retransmission succeeds; it never skips ahead. For a live video frame or audio sample, arriving late is the same as not arriving at all.

But “don’t use TCP” isn’t a complete answer. What should you use? The protocol landscape for delay-sensitive data is vast—spanning media streaming, industrial automation, robotics, financial messaging, and IoT. Each protocol answers the fundamental question differently.

Continue reading →

Protocol Reference: Transport for Data with Deadlines

This is a quick-reference list of protocols for applications where data has a deadline, and late delivery is failure—whether that deadline is 10μs (servo loop), 20ms (audio buffer), or 300ms (video call).

For taxonomy, analysis, and context, see When Packets Can’t Wait. This page is just the inventory—a living reference that grows as protocols become relevant to JitterTrap development.

Continue reading →

The Jitter Cliff: When TCP Goes Chaotic

In Part 1, we used “video over TCP” as a stress test for TCP’s behavior—examining how zero-window events, retransmits, and the masking effect reveal what’s happening inside a struggling connection.

But during those experiments, I discovered that TCP throughput degrades rapidly as jitter worsens. While I knew that packet loss would destroy TCP throughput, I hadn't quite expected the jitter-induced cliff.

At a certain jitter threshold, throughput collapses so severely that measurements become unreliable. Single tests can vary by over 100%. This “chaos zone” makes diagnosis treacherous: the same network conditions can produce wildly different results depending on when you measure.

This post explores TCP’s behavior under jitter and loss, comparing CUBIC and BBR. It’s common knowledge that TCP is inappropriate for delay-sensitive streaming data, and this post will try to demonstrate how and why.

Continue reading →

Diagnosing Video Stuttering Over TCP: A JitterTrap Investigation

Your security camera feed stutters. Your video call over the corporate VPN freezes. The question isn’t whether something is wrong—that’s obvious. The question is what is wrong, because the fix depends entirely on the diagnosis.

Is the problem the sender, the network, or the receiver? These require fundamentally different interventions. Telling someone to “upgrade their internet connection” when the real issue is their overloaded NVR is worse than useless—it wastes time and money while the actual problem persists.

Sender-side problems—where the source isn’t transmitting at the expected rate—are straightforward to detect: compare actual throughput to expected throughput. The harder question is distinguishing network problems from receiver problems when data is being sent. TCP’s built-in feedback mechanisms give us the answer.
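Those feedback mechanisms are readable directly from userspace. As a minimal sketch (Linux only; the field offsets follow the classic `struct tcp_info` layout in `linux/tcp.h` and can differ between kernel versions), a connected socket's retransmit counters can be pulled via `TCP_INFO`:

```python
import socket
import struct

TCP_INFO = getattr(socket, "TCP_INFO", 11)  # value from linux/tcp.h

def tcp_health(sock):
    """Decode a few diagnostic fields from TCP_INFO.

    Offsets follow the classic struct tcp_info: seven u8 fields
    (state, ca_state, retransmits, probes, backoff, options) plus a
    wscale byte, then u32 fields; tcpi_total_retrans sits at offset
    100 in the 104-byte layout.
    """
    info = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 104)
    state, ca_state, retransmits = struct.unpack_from("BBB", info, 0)
    (total_retrans,) = struct.unpack_from("I", info, 100)
    return {"state": state, "ca_state": ca_state,
            "retransmits": retransmits, "total_retrans": total_retrans}

# Demo over loopback: connect a client and send a little data.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.socket()
cli.connect(srv.getsockname())
conn, _ = srv.accept()
cli.sendall(b"x" * 4096)
stats = tcp_health(cli)
print(stats)
for sk in (cli, conn, srv):
    sk.close()
```

On a healthy loopback connection, state is 1 (ESTABLISHED) and the retransmit counters stay at zero; a climbing total_retrans points toward the network, while receiver-side zero-window stalls are more readily seen in packet captures or `ss -ti` output.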

UDP is the natural transport for real-time video—it tolerates loss gracefully and avoids head-of-line blocking. But video often ends up traveling over TCP whether we like it or not. VPN tunnels may encapsulate everything in TCP. Security cameras fall back to RTSP interleaved mode (RTP-over-TCP) when UDP is blocked. Some equipment simply doesn’t offer a choice.

The research question driving this investigation: Can we identify reliable TCP metrics that distinguish network problems from receiver problems?

Through controlled experiments, I found the answer is yes—with important caveats. This post builds a complete diagnostic framework covering all three problem types, with the experiments focused on the harder network-vs-receiver distinction. Part 2 will explore what happens when TCP goes chaotic.

Continue reading →

Linux Network Configuration: A Decade Later

In 2014 I wrote about the state of Linux network configuration, lamenting the proliferation of netlink libraries and how most projects hadn’t progressed past shell scripting and iproute2. I concluded that “there is a need for a good netlink library for one of the popular scripting languages.”

A decade later, that library exists. More importantly, the ecosystem has matured enough that every major language has a credible netlink option - and production systems are using them.

To compare them, I’ll use the same example throughout: create a bridge, a network namespace, and a veth pair connecting them.
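As a baseline for that comparison, the task can be expressed as the raw iproute2 command sequence, here wrapped in a small Python helper that only generates the commands (the names br0, ns0, veth0, and veth1 are placeholders, and actually running them requires CAP_NET_ADMIN):

```python
def bridge_netns_veth(bridge="br0", netns="ns0", veth=("veth0", "veth1")):
    """Return the iproute2 commands for the running example: a bridge,
    a network namespace, and a veth pair connecting the two."""
    host_end, ns_end = veth
    return [
        f"ip link add {bridge} type bridge",
        f"ip link set {bridge} up",
        f"ip netns add {netns}",
        f"ip link add {host_end} type veth peer name {ns_end}",
        f"ip link set {host_end} master {bridge} up",
        f"ip link set {ns_end} netns {netns}",
        f"ip -n {netns} link set {ns_end} up",
    ]

for cmd in bridge_netns_veth():
    print(cmd)
```

Seven shell commands is the bar each netlink library has to clear, in both line count and error handling.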

Continue reading →

The Curious Case of the Disappearing Multicast Packet

Many consumer and industrial devices—from home security cameras to the HDMI-over-IP extenders I investigated in a previous post—are designed as simple appliances. They often have hard-coded, unroutable IP addresses (e.g., 192.168.1.100) and expect to live on a simple, isolated network. This becomes a major problem when you need to get their multicast video or data streams from that isolated segment to users on a main LAN. My goal was to solve this with a standard Linux server, creating a simple, high-performance multicast router without a dedicated, expensive hardware box.

Can a standard Linux server act as a simple, kernel-native multicast router for devices with hard-coded, unroutable IP addresses? This article chronicles an investigation into combining nftables SNAT with direct control of the Multicast Forwarding Cache (MFC).

Continue reading →

Rapido - rapidly creating test VMs for driver development

Thanks to rapido, it’s become much simpler to test Linux device drivers for real PCIe devices in VMs.

The advantages of this approach are:

  • the host is protected from memory corruption errors caused by buggy kernel drivers
  • the PCI peripheral can be physically installed in a multi-use machine, reducing hardware & lab requirements
  • debugging info is easily available
  • the development cycle is short and simple - rapid even :)
Continue reading →

PMTU weirdness

On a good day, my day job involves building networking tools in Python. Too many Python networking tools look like shell scripts, spawning subprocesses for basic tools like ‘ping’ or ‘ip’ - often resulting in a fragile mess due to poor or inconsistent error handling.

I was quite excited to find icmplib. It provides a much simpler, less fragile way to do things like reachability tests, RTT measurements, path discovery and path MTU discovery in python code. Hopefully it finds its way into Fedora soon!

Armed with icmplib, I went on a journey of discovery to develop my understanding of Path MTU. I specifically wanted to understand the differences between IPv4 and IPv6, and the effect of VETH, VLAN and Bridge virtual devices.
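Independently of icmplib, the kernel's own PMTU estimate for a route can be read with plain sockets. A minimal sketch (Linux-only socket options; the numeric fallback values are from `linux/in.h`):

```python
import socket

# Linux-only constants; getattr guards Pythons that don't expose them.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)

def path_mtu(host, port=9):
    """Read the kernel's cached path MTU toward a destination.

    Connecting a UDP socket resolves a route without sending any
    packets; with DF forced on, IP_MTU then reports the kernel's
    current PMTU estimate for that route.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        s.connect((host, port))
        return s.getsockopt(socket.IPPROTO_IP, IP_MTU)
    finally:
        s.close()

print(path_mtu("127.0.0.1"))  # loopback: typically 65536
```

Over loopback this typically reports 65536; across a VETH, VLAN or bridge path it reflects whatever MTU those virtual devices impose, which is exactly the behaviour the experiments probe.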

Continue reading →

DSCP vs Linux socket priorities

I received some encouraging comments on G+ from Jesper Dangaard Brouer about my previous post on DSCP, Linux and VLAN priorities. Those comments and the work linked to (here) point to a few long-standing (but minor) issues with the way DSCP priorities are handled in Linux.

  1. Some DSCP values, like Expedited Forwarding, are not currently (3.17 and earlier) handled correctly.
  2. Linux Priorities, defined in include/uapi/linux/pkt_sched.h, are not documented particularly well, but form part of the stable interface with userspace. Working with traffic classification (tc), queuing disciplines (qdisc) or VLANs requires at least a basic understanding of Linux socket priorities.
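For point 2, the userspace side of that stable interface is just a socket option. A small sketch (the SO_PRIORITY fallback value 12 is from asm-generic/socket.h):

```python
import socket

SO_PRIORITY = getattr(socket, "SO_PRIORITY", 12)  # asm-generic/socket.h

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# TC_PRIO_INTERACTIVE = 6 (include/uapi/linux/pkt_sched.h); values
# 0-6 are settable without CAP_NET_ADMIN, higher values are not.
s.setsockopt(socket.SOL_SOCKET, SO_PRIORITY, 6)
prio = s.getsockopt(socket.SOL_SOCKET, SO_PRIORITY)
print(prio)
s.close()
```

This socket priority is what classifiers and default qdiscs such as pfifo_fast consult when queuing locally generated traffic.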
Continue reading →

Looking into DSCP and IEEE 802.1p (VLAN priorities).

I recently discovered a flaw in the VLAN implementation I did at work. It seemed that the normal TCP traffic had the correct VLAN priorities applied, but audio streaming UDP traffic did not.

This was due to a combination of DSCP being applied to the streaming audio and an incorrect egress-qos-map on the VLAN device.

I had assumed, incorrectly, that VLAN priorities are applied to all traffic as long as we’re not using fancy queuing disciplines (qdiscs). After all, 802.1Q is strictly a layer 2 thing.
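The interaction that bit me here: the egress-qos-map keys on the socket's priority, not on DSCP itself, and on Linux setting IP_TOS also rewrites the socket priority from the TOS bits (via rt_tos2priority(), as far as I can tell). A sketch of marking a UDP audio socket with DSCP Expedited Forwarding:

```python
import socket

EF_DSCP = 46            # Expedited Forwarding (RFC 3246)
EF_TOS = EF_DSCP << 2   # DSCP occupies the top six bits of the TOS byte

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_TOS)
tos = s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(hex(tos))  # 0xb8
s.close()
```

With the VLAN device's egress-qos-map extended to cover the priority this maps to, the UDP stream gets the intended 802.1p PCP instead of the default.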

Continue reading →

Linux Network Configuration

This concerns the proliferation of netlink libraries and a lack of direction and documentation. Background: I’ve configured a router with netem (see Bandwidth Throttling with NetEM Network Emulation and the tc-netem man page) to test Tieline devices under various delay and loss network conditions. I…
Continue reading →